Goto

Collaborating Authors

 gradient penalty


Neglected Hessian component explains mysteries in sharpness regularization

Neural Information Processing Systems

Recent work has shown that methods that regularize second order information like SAM can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We investigate this inconsistency and reveal its connection to the the structure of the Hessian of the loss. Specifically, its decomposition into the positive semi-definite Gauss-Newton matrix and an indefinite matrix, which we call the Nonlinear Modeling Error (NME) matrix. Previous studies have largely overlooked the significance of the NME in their analysis for various reasons. However, we provide empirical and theoretical evidence that the NME is important to the performance of gradient penalties and explains their sensitivity to activation functions. We also provide evidence that the difference in regularization performance between gradient penalties and weight noise can be explained by the NME. Our findings emphasize the necessity of considering the NME in both experimental design and theoretical analysis for sharpness regularization.


Neglected Hessian component explains mysteries in sharpness regularization

Neural Information Processing Systems

SAM can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We investigate this inconsistency and reveal its connection to the the structure of the Hessian of the loss.




How to Train Your Latent Control Barrier Function: Smooth Safety Filtering Under Hard-to-Model Constraints

arXiv.org Artificial Intelligence

Latent safety filters extend Hamilton-Jacobi (HJ) reachability to operate on latent state representations and dynamics learned directly from high-dimensional observations, enabling safe visuo-motor control under hard-to-model constraints. However, existing methods implement "least-restrictive" filtering that discretely switch between nominal and safety policies, potentially undermining the task performance that makes modern visuomotor policies valuable. While reach-ability value functions can, in principle, be adapted to be control barrier functions (CBFs) for smooth optimization-based filtering, we theoretically and empirically show that current latent-space learning methods produce fundamentally incompatible value functions. We identify two sources of incompatibility: First, in HJ reachability, failures are encoded via a "margin function" in latent space, whose sign indicates whether or not a latent is in the constraint set. However, representing the margin function as a classifier yields saturated value functions that exhibit discontinuous jumps. We prove that the value function's Lipschitz constant scales linearly with the margin function's Lipschitz constant, revealing that smooth CBFs require smooth margins. Second, reinforcement learning (RL) approximations trained solely on safety policy data yield inaccurate value estimates for nominal policy actions, precisely where CBF filtering needs them. We propose the LatentCBF, which addresses both challenges through gradient penalties that lead to smooth margin functions without additional labeling, and a value-training procedure that mixes data from both nominal and safety policy distributions. Experiments on simulated benchmarks and hardware with a vision-based manipulation policy demonstrate that LatentCBF enables smooth safety filtering while doubling the task-completion rate over prior switching methods.



Neglected Hessian component explains mysteries in sharpness regularization

Neural Information Processing Systems

SAM can improve generalization in deep learning. Seemingly similar methods like weight noise and gradient penalties often fail to provide such benefits. We investigate this inconsistency and reveal its connection to the the structure of the Hessian of the loss.


NS-Pep: De novo Peptide Design with Non-Standard Amino Acids

arXiv.org Artificial Intelligence

Peptide drugs incorporating non-standard amino acids (NSAAs) offer improved binding affinity and improved pharmacological properties. However, existing peptide design methods are limited to standard amino acids, leaving NSAA-aware design largely unexplored. We introduce NS-Pep, a unified framework for co-designing peptide sequences and structures with NSAAs. The main challenge is that NSAAs are extremely underrepresented-even the most frequent one, SEP, accounts for less than 0.4% of residues-resulting in a severe long-tailed distribution. To improve generalization to rare amino acids, we propose Residue Frequency-Guided Modification (RFGM), which mitigates over-penalization through frequency-aware logit calibration, supported by both theoretical and empirical analysis. Furthermore, we identify that insufficient side-chain modeling limits geometric representation of NSAAs. To address this, we introduce Progressive Side-chain Perception (PSP) for coarse-to-fine torsion and location prediction, and Interaction-Aware Weighting (IAW) to emphasize pocket-proximal residues. Moreover, NS-Pep generalizes naturally to the peptide folding task with NSAAs, addressing a major limitation of current tools. Experiments show that NS-Pep improves sequence recovery rate and binding affinity by 6.23% and 5.12%, respectively, and outperforms AlphaFold3 by 17.76% in peptide folding success rate.